Materials+ML Workshop Day 4¶
- Data Manipulation:
- The Pandas Package
- Working with DataFrames
- Visualizing Data
- The Matplotlib package
- Visualizing 1D data
- Visualizing 2D and 3D data
Tentative Week 1 Schedule:¶
Session | Date | Content |
Day 1 | 06/09/2025 (2:00-4:00 PM) | Introduction, Python Data Types |
Day 2 | 06/10/2025 (2:00-4:00 PM) | Python Functions and Classes |
Day 3 | 06/11/2025 (2:00-4:00 PM) | Scientific Computing with Numpy and Scipy |
Day 4 | 06/12/2025 (2:00-4:00 PM) | Data Manipulation and Visualization |
Day 5 | 06/13/2025 (2:00-4:00 PM) | Materials Science Packages, Introduction to ML |
Review: Day 3¶
Numpy Package¶
- Numpy supplies mathematical functions (such as sin(x), exp(x), etc.)
- Numpy arrays (numpy.ndarray) are multi-dimensional data structures
- These arrays can represent vectors, matrices, tensors, etc.
- Creating Numpy arrays:
In [1]:
import numpy as np
# create a 1D array:
x = np.array([1.0, 2.0, 3.0, 4.0])
print(x)
# create a 2D array (matrix):
X = np.array([
[1,2,3],
[4,5,6],
[7,8,9]
])
print(X)
[1. 2. 3. 4.]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
- Every array has an instance variable shape, which is a tuple of integers:
- The length of the tuple is the number of dimensions (axes) of the array
- The entries in the tuple give the size of the array along each axis
In [2]:
# x is a 1D array of length 4:
print(x.shape)
# X is a 3x3 matrix:
print(X.shape)
# create an array of zeros with a 3x2x2 shape:
S = np.zeros((3,2,2))
print(S.shape)
(4,)
(3, 3)
(3, 2, 2)
- Numpy arrays can be indexed like Python lists, but with some added features:
In [3]:
X = np.array(range(1,10)).reshape((3,3))
print(X)
# access row 0:
print('Accessing X[0]:')
print(X[0])
# access row 0, column 2:
print('Accessing X[0,2]:')
print(X[0,2])
# access column 0:
print('Accessing X[:,0]:')
print(X[:,0])
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Accessing X[0]:
[1 2 3]
Accessing X[0,2]:
3
Accessing X[:,0]:
[1 4 7]
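The "added features" include negative indices, slices along several axes at once, and boolean masks. Here is a short sketch using the same 3x3 matrix X. The specific slices are chosen for illustration. The boolean-mask idea reappears later when filtering pandas DataFrames.

```python
import numpy as np

X = np.array(range(1, 10)).reshape((3, 3))

# negative indices count from the end: -1 is the last row
last_row = X[-1]

# slice along both axes: rows 0-1 and columns 1-2 give a 2x2 sub-matrix
sub = X[0:2, 1:3]

# boolean ("mask") indexing: X > 4 is an array of True/False,
# and X[mask] returns the selected entries as a flat 1D array (row-major order)
big = X[X > 4]

print(last_row)  # [7 8 9]
print(sub)       # [[2 3]
                 #  [5 6]]
print(big)       # [5 6 7 8 9]
```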
- Arithmetic operations on arrays (+, -, *, /, **) are performed elementwise
- Numpy supports matrix multiplication with the @ operator
In [4]:
A = np.array(range(1,5)).reshape(2,2)
D = np.diag([1,2])
print('A:\n', A)
print('D:\n', D)
# elementwise addition:
print(A + D)
# matrix multiplication:
print(A @ D)
A:
 [[1 2]
 [3 4]]
D:
 [[1 0]
 [0 2]]
[[2 2]
 [3 6]]
[[1 4]
 [3 8]]
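A common mix-up is between `*` and `@`. Because arithmetic is elementwise, `A * D` multiplies matching entries, while `A @ D` is the matrix product. This sketch reuses A and D from the cell above to show the difference:

```python
import numpy as np

A = np.array(range(1, 5)).reshape(2, 2)
D = np.diag([1, 2])

# * multiplies matching entries (elementwise, a.k.a. Hadamard product)
elementwise = A * D

# @ is true matrix multiplication: entry (i, j) = row i of A dotted with column j of D
matmul = A @ D

print(elementwise)  # [[1 0]
                    #  [0 8]]
print(matmul)       # [[1 4]
                    #  [3 8]]
```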
- One important numpy function we will use a lot today is np.linspace:
In [5]:
start = 0.0
end = 10.0
n_pts = 11
# create a 1D array of uniform points:
x_pts = np.linspace(start, end, n_pts)
print(x_pts)
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
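Note that n_pts counts points, not intervals. With the endpoint included, the spacing is (end - start) / (n_pts - 1), so 11 points from 0 to 10 are spaced by 1. The sketch below checks this with the `retstep` and `endpoint` keyword arguments of np.linspace:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
x_pts, step = np.linspace(0.0, 10.0, 11, retstep=True)
print(step)  # 1.0

# endpoint=False excludes the end value: 10 points spaced by 1.0, ending at 9.0
x_open = np.linspace(0.0, 10.0, 10, endpoint=False)
print(x_open[-1])  # 9.0

# spacing formula when the endpoint is included: (end - start) / (n_pts - 1)
print((10.0 - 0.0) / (11 - 1))  # 1.0
```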
Scipy Package¶
- Scipy provides many useful subpackages for scientific computing
- Subpackages you may find useful include:
- scipy.constants: physical constants, unit conversions
- scipy.optimize: functions for optimization and root finding
- scipy.integrate: functions for numerical integration
- scipy.stats: statistical analysis functions
- scipy.special: special functions (e.g. Bessel functions)
New Content:¶
- More Python packages:
- Pandas ("Panel Datasets")
- Matplotlib ("MATLAB-like plotting library")
Checking if Packages are installed¶
- The quickest way to check if a package is installed on your system is to import it:
In [6]:
import matplotlib
In [7]:
import pandas
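If a package is missing, a bare import stops the notebook with a ModuleNotFoundError (a subclass of ImportError). Below is a minimal sketch that checks several packages without stopping. `check_package` is a hypothetical helper name. The `__version__` attribute is a common convention, not a guarantee, so the sketch falls back to "unknown".

```python
import importlib

def check_package(name):
    """Try to import `name`; return its version string, or None if it is not installed."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # most scientific packages expose __version__, but it is not guaranteed
    return getattr(module, "__version__", "unknown")

for pkg in ["numpy", "pandas", "matplotlib"]:
    version = check_package(pkg)
    if version is None:
        print(f"{pkg}: not installed")
    else:
        print(f"{pkg}: version {version}")
```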
Installing Pandas:¶
In [8]:
!pip install pandas
Requirement already satisfied: pandas in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (2.2.3)
Requirement already satisfied: tzdata>=2022.7 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from pandas) (2023.3)
Requirement already satisfied: pytz>=2020.1 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from pandas) (2023.3)
Requirement already satisfied: python-dateutil>=2.8.2 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from pandas) (2.8.2)
Requirement already satisfied: numpy>=1.22.4 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from pandas) (2.2.6)
Requirement already satisfied: six>=1.5 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Installing Matplotlib:¶
In [9]:
!pip install matplotlib
Requirement already satisfied: matplotlib in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (3.10.3)
Requirement already satisfied: packaging>=20.0 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (23.1)
Requirement already satisfied: pillow>=8 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (9.5.0)
Requirement already satisfied: numpy>=1.23 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (2.2.6)
Requirement already satisfied: pyparsing>=2.3.1 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (2.4.7)
Requirement already satisfied: fonttools>=4.22.0 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (4.39.4)
Requirement already satisfied: cycler>=0.10 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: python-dateutil>=2.7 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: contourpy>=1.0.1 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (1.0.7)
Requirement already satisfied: kiwisolver>=1.3.1 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (1.4.4)
Requirement already satisfied: six>=1.5 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
Pandas¶
- Pandas is an open-source Python package for data manipulation and analysis.
- It can be used for reading and writing data in several different formats, including:
- CSV (comma-separated values)
- Excel spreadsheets
- SQL databases
- We can import pandas as follows:
In [10]:
import pandas as pd
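Here is a minimal sketch of the CSV round trip mentioned above. It uses an in-memory string (via io.StringIO) instead of a real file so that it runs anywhere. In practice you would pass a file path to `pd.read_csv` and `DataFrame.to_csv`. The other formats use `pd.read_excel` and `pd.read_sql`, and reading .xlsx files additionally requires an engine such as openpyxl.

```python
import io

import pandas as pd

# a small CSV "file" held in memory; normally this would be a path like "data.csv"
csv_text = """Element,Atomic Number,Mass
H,1,1.008
He,2,4.002
"""

# read_csv accepts a path or any file-like object; the first line becomes the column names
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# to_csv with no path returns the CSV as a string; index=False omits the row labels
out = df.to_csv(index=False)
print(out)
```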
DataFrames¶
- Pandas introduces the DataFrame type for manipulating data
- We can create DataFrames from Python dictionaries as follows:
In [13]:
# Data on the first four elements of the periodic table:
elements_data = {
'Element' : ['H', 'He', 'Li', 'Be'],
'Atomic Number' : [ 1, 2, 3, 4 ],
'Mass' : [ 1.008, 4.002, 6.940, 9.012],
'Electronegativity' : [ 2.20, 0.0, 0.98, 1.57 ]  # He has no Pauling electronegativity; 0.0 is a placeholder
}
# construct dataframe from data dictionary:
df = pd.DataFrame(elements_data)
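Once the DataFrame is built, a few attributes give a quick overview. The sketch below rebuilds the same table and inspects it. Each dictionary key becomes a column name, and each list becomes that column's values.

```python
import pandas as pd

elements_data = {
    'Element': ['H', 'He', 'Li', 'Be'],
    'Atomic Number': [1, 2, 3, 4],
    'Mass': [1.008, 4.002, 6.940, 9.012],
    'Electronegativity': [2.20, 0.0, 0.98, 1.57],
}
df = pd.DataFrame(elements_data)

print(df)                # one row per element, one column per dictionary key
print(df.shape)          # (number of rows, number of columns) -> (4, 4)
print(list(df.columns))  # the dictionary keys, in insertion order
print(df.dtypes)         # data type of each column (object for strings)
```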
Tutorial: Working with Pandas DataFrames¶
- Accessing Dataframe columns
- Filtering Dataframes
- Transforming Data
- Importing and exporting data
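The four tutorial topics above can be previewed in one short sketch on the elements table. The derived column "Mass per Proton" is a hypothetical, purely illustrative transformation. The export goes to an in-memory buffer in place of a file on disk.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Element': ['H', 'He', 'Li', 'Be'],
    'Atomic Number': [1, 2, 3, 4],
    'Mass': [1.008, 4.002, 6.940, 9.012],
})

# Accessing columns: one name gives a Series, a list of names gives a DataFrame
masses = df['Mass']
subset = df[['Element', 'Mass']]

# Filtering: a boolean Series (one True/False per row) selects matching rows
heavy = df[df['Mass'] > 5.0]

# Transforming: new columns are computed elementwise from existing ones
# ('Mass per Proton' is a hypothetical column used only for illustration)
df['Mass per Proton'] = df['Mass'] / df['Atomic Number']

# Exporting and importing: write CSV to a buffer, rewind, and read it back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_again = pd.read_csv(buffer)

print(heavy)
print(df_again)
```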
Exercise: Working with Pandas DataFrames¶
- Exploring the Periodic Table
- Download the Periodic Table CSV file.
- Answer the following questions:
- What fraction of elements in the Periodic Table were discovered before 1900?
- Which elements have at least 100 isotopes?
- What is the average atomic mass of the radioactive elements?
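Each question above reduces to a boolean mask on one column. Here is a sketch of the three patterns as hypothetical helper functions. The column names of the downloaded CSV are not given here, so the helpers take them as arguments; inspect df.columns on the loaded file and pass the real names.

```python
import pandas as pd

def fraction_below(df, col, cutoff):
    """Fraction of all rows whose value in `col` is strictly less than `cutoff`."""
    # a comparison gives a boolean Series; its mean is the fraction of True values.
    # NaN compares as False, so rows with a missing value count in the denominator only.
    return (df[col] < cutoff).mean()

def rows_at_least(df, col, threshold):
    """Rows whose value in `col` is greater than or equal to `threshold`."""
    return df[df[col] >= threshold]

def mean_where_equal(df, value_col, flag_col, flag_value):
    """Mean of `value_col` over the rows where `flag_col` equals `flag_value`."""
    return df.loc[df[flag_col] == flag_value, value_col].mean()
```

Decide how to treat missing values (for example, elements with no recorded discovery year) before trusting the resulting fraction.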
Matplotlib¶
- Matplotlib is a MATLAB-like plotting utility for creating publication-quality plots
- In matplotlib, we typically import the pyplot subpackage with the alias plt:
In [12]:
import matplotlib.pyplot as plt
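Here is a minimal sketch of a 1D plot using the object-oriented interface (`plt.subplots` returns a figure and an axes object). It covers line styles, axis labels, a title, a legend, and math typesetting with `$...$` in labels. The `matplotlib.use("Agg")` line only selects a non-interactive backend so that the script runs without a display; omit it inside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label=r'$\sin(x)$')        # solid line (default style)
ax.plot(x, np.cos(x), '--', label=r'$\cos(x)$')  # dashed line
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$y$')
ax.set_title('Trigonometric functions')
ax.legend()  # builds the legend from each line's label
plt.close(fig)  # free the figure; in a notebook you would usually just display it
```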
Tutorial: Plotting with Matplotlib¶
- Plotting 1D data
- Styling plots
- Adding axes labels, titles, legends
- Typesetting
- Plotting in 3D
- Saving figures
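For the last two topics, here is a sketch that draws a 3D surface and saves it to disk. It assumes a recent matplotlib (3.2 or later), where `projection='3d'` works without importing mpl_toolkits explicitly. The Gaussian surface and the temporary output path are arbitrary choices for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# build a 2D grid of (x, y) points and evaluate z = f(x, y) on it
x = np.linspace(-2.0, 2.0, 50)
y = np.linspace(-2.0, 2.0, 50)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))  # a Gaussian "bump" with maximum 1 at the origin

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection='3d')  # 3D axes
surf = ax.plot_surface(X, Y, Z, cmap='viridis')
fig.colorbar(surf, ax=ax, shrink=0.6, label='z')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

# the file extension (.png, .pdf, .svg, ...) selects the output format
out_path = os.path.join(tempfile.gettempdir(), 'gaussian_surface.png')
fig.savefig(out_path, dpi=200, bbox_inches='tight')
plt.close(fig)
```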
Exercises: Plotting with Matplotlib¶
- Histograms
- Chaotic Dynamical Systems
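The exercise details are not given here. As an assumed example of a chaotic dynamical system, this sketch iterates the logistic map x_{n+1} = r x_n (1 - x_n) at r = 4. It shows two orbits that start 1e-9 apart and separate quickly, then histograms where one orbit spends its time. `logistic_orbit` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

def logistic_orbit(r, x0, n_steps):
    """Iterate x_{n+1} = r * x_n * (1 - x_n) and return the n_steps iterates after x0."""
    xs = np.empty(n_steps)
    x = x0
    for i in range(n_steps):
        x = r * x * (1.0 - x)
        xs[i] = x
    return xs

# r = 4 is in the chaotic regime: a 1e-9 difference in x0 grows to order 1
orbit_a = logistic_orbit(4.0, 0.2, 10_000)
orbit_b = logistic_orbit(4.0, 0.2 + 1e-9, 10_000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# left: the first 60 steps of both orbits
ax1.plot(orbit_a[:60], label='x0 = 0.2')
ax1.plot(orbit_b[:60], '--', label='x0 = 0.2 + 1e-9')
ax1.set_xlabel('step n')
ax1.set_ylabel('x_n')
ax1.legend()

# right: normalized histogram of all iterates of one orbit
ax2.hist(orbit_a, bins=50, density=True)
ax2.set_xlabel('x')
ax2.set_ylabel('density')
plt.close(fig)
```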
Recommended Reading:¶
- Materials Science Python Packages
Bring your questions to our next meeting tomorrow!